This dashboard shows a handful of data visualization use cases for the Unstructured Data Analytics initiative of the Allegheny County Department of Public Health. This was developed by staff at Chapin Hall and its partners at Argonne National Laboratory.
A simulation of an interactive topic prevalence+trend figure for the case-level visualization
Both sentiment and topical scores can be used for macro planning related to:
For example, this type of table allows a user to filter clients by geographic regions (or other case characteristics), select those with combinations of needs, sort by sentiment, and export this information to share with other role-players.
Both sentiment and topical scores can be used for macro planning related to:
This map identifies geographies with the greatest need for attention by topic. Users are able to pan, zoom, and click for pop-up annotations with additional contextual information.
This helps identify volume and type of needs, along with distribution. With additional layers showing the distribution of service facitilies it can be possible to also see how demand and supply of services can be matched.
---
title: "ACDHS Dashboard Demo"
author: "Chapin Hall"
date: "August 6, 2018"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
vertical_layout: fill
logo: img/AGC-triangle-small.png
css: flexdash.css
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warnings = FALSE, error = FALSE)
### Load and--if necessary--install packages ----------------------------------#
package.list <- c("flexdashboard", "data.table", "dplyr", "stringr", "rgdal",
"MASS", "plotly", "leaflet", "ggplot2", "scales", "DT")
for (p in package.list){
if (!p %in% installed.packages()[, "Package"]) install.packages(p)
library(p, character.only = TRUE)
}
### Load custom functions -----------------------------------------------------#
grepv <- function(p, x, ...) grep(p, x, value = TRUE, ...)
cn <- function(x) colnames(x)
```
### Dashboard Demonstration
This dashboard shows a handful of data visualization use cases for the Unstructured Data Analytics initiative of the Allegheny County Department of Public Health. This was developed by staff at Chapin Hall and its partners at Argonne National Laboratory.
- This was made with the free and open-source [R programming language](https://www.r-project.org/about.html)
- Code-driven dashboards like this can be automatically generated to be custom for any given user, and updated with new inputs, analysis, and visualization as part of the full analysis pipeline
- The code and data used to create this example can be found—and reused and contributed to—at [this open-source GitHub repository](https://github.com/chapinhall/acdhs-demo)
- **Note** that all data here is fake, intended only for the purpose of illustration
### Case-Level Visualization {data-commentary-width=500}
***
- This figure combines
- text-mined sentiment score
- text-mined key topics of relevance
- indication of institutional contacts, including DHS investigation and intervention
- Low sentiment scores could be used to trigger specific interventions or referrals to other practitioners, when combined with key topic combinations
- Topic and sentiment scores can help supervisors quickly review timely characteristics of individual cases or a worker's entire caseload
- Persistence of key topics can raise flags to QA teams to complete additional modules, e.g. related to housing concerns
### Interactive Topic Visualization
```{r place interactive topic viz}
load("img/interactive_topic_viz.Rda")
topic_viz
```
***
A simulation of an interactive topic prevalence+trend figure for the case-level visualization
### Macro Planning -- Client Query Table {data-commentary-width=500}
```{r generate macro planning data}
mpn <- 500
Draw1to100 <- function(n, pwr) round(runif(n)^pwr*100, 0)
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Neptune", "Uranus")
mp <- data.frame(ID = str_pad(round(runif(mpn)*1e7, 0), width = 7, side = "left", pad = "0"),
AQ = paste0(sample(c(2017, 2018), mpn, replace = TRUE),
"-Q",
sample(1:4, mpn, replace = TRUE)),
Region = factor(sample(planets, mpn, replace = TRUE), levels = planets),
Sentiment = Draw1to100(mpn, 1.0),
MH = Draw1to100(mpn, 2.0),
DA = Draw1to100(mpn, 2.0),
DV = Draw1to100(mpn, 2.0),
H = Draw1to100(mpn, 2.0),
E = Draw1to100(mpn, 0.5),
M = Draw1to100(mpn, 1.0),
SC = Draw1to100(mpn, 0.5)) %>%
filter(!AQ %in% c("2018-Q3", "2018-Q4"))
```
```{r place and format dynamic table}
# Develop color scale for quantiles of all top
# /!\ An alternative scheme might just be for 1-10, 11-20, etc as hard divisions
topics <- c("MH", "DA", "DV", "H", "E", "M", "SC")
topic_brks <- quantile(select(mp, one_of(topics)), probs = seq(.05, .95, .05), na.rm = TRUE)
# Develop colors as increasing shades of blue
topic_clrs <-
round(seq(255, 40, length.out = length(topic_brks) + 1), 0) %>%
paste0("rgb(", ., ",", ., ",255)")
topicNames <- c("Mental Health", "Drug/Alcohol", "Domestic Violence", "Housing",
"Education", "Medical", "Social Connection")
sent_brks <- quantile(mp$Sentiment, probs = seq(.05, .95, .05), na.rm = TRUE)
sent_clrs <- colorRampPalette(c("firebrick2", "forestgreen"))(length(sent_brks) + 1)
colnames(mp) <- c("ID", "Assessment Quarter", "Region", "Sentiment", topicNames)
rownames(mp) <- NULL
datatable(mp,
extensions = 'Buttons',
filter = 'top',
rownames = FALSE,
options = list(dom = 'Bfrtip',
buttons = list('copy', 'print', list(
extend = 'collection',
buttons = c('csv', 'excel', 'pdf'),
text = 'Download')),
pageLength = 20)) %>%
formatStyle("Sentiment", backgroundColor = styleInterval(sent_brks, sent_clrs), color = "white") %>%
formatStyle(topicNames, backgroundColor = styleInterval(topic_brks, topic_clrs))
```
***
Both sentiment and topical scores can be used for macro planning related to:
* **Volume** -- who are the children, youth, and families who would best benefit from a new intervention
* **Needs** -- what are the classes of service need within this population
* **Distribution** -- where is the service need located
* **Service Matching** -- how do we match families to the most appropriate services
For example, this type of table allows a user to filter clients by geographic regions (or other case characteristics), select those with combinations of needs, sort by sentiment, and export this information to share with other role-players.
### Macro Planning -- Topic Scores by Geography {data-commentary-width=500}
```{r display zipcode map with topic layers}
# Get zipcode boundary data
zips <- readOGR(dsn = "data", layer = "Allegheny_County_Zip_Code_Boundaries", verbose = FALSE)
# Sample topics:
topics <- c("Pregnant", "Domestic Violence", "Opiates", "Suicide", "Low Income")
sigma <- matrix(c(1.0, 0.2, 0.1, 0.2, 0.4,
0.2, 1.0, 0.3, 0.4, 0.5,
0.1, 0.3, 1.0, 0.4, 0.6,
0.2, 0.4, 0.4, 1.0, 0.4,
0.4, 0.5, 0.6, 0.4, 2.0), nrow = 5)
mu <- c(0.8, 1.5, 0.0, -1.0, 1.0)
# Draw correlated values
set.seed(8062018)
draws <- mvrnorm(n = nrow(zips@data), mu = mu, Sigma = sigma)
# Put this on a 0-100 scale, label, and arrange in order (to give a pattern to matching with zips)
draws <- round((draws - min(draws))/(max(draws) - min(draws))*100, 1)
colnames(draws) <- topics
draws <- arrange(as.data.frame(draws), -`Low Income`)
# Attach values to zipcodes
zips@data <-
cbind(zips@data, draws) %>%
within({
topicPop <- paste0("Topic Scores for Zip ", ZIP, ":
",
"Pregnant: ", sprintf("%1.1f", Pregnant), "
",
"Domestic Violence: ", sprintf("%1.1f", `Domestic Violence`), "
",
"Opiates: ", sprintf("%1.1f", Opiates), "
",
"Suicide: ", sprintf("%1.1f", Suicide), "
",
"Low Income: ", sprintf("%1.1f", `Low Income`), "
",
"More Census data on zip code ", ZIP, "")
geo_label <- paste0(str_to_title(NAME), ", zip: ", ZIP)
})
palBin_blue <- colorBin(palette = "Blues", domain = c(0, 100), bins = 5)
myOpacity <- 0.7
leaflet(data = zips) %>% # width = "100%"
#addTiles() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(group = "Pregnant", fillColor = ~palBin_blue(Pregnant),
popup = ~topicPop, weight = 2.0, fillOpacity = myOpacity,
label=~geo_label, labelOptions= labelOptions(direction = 'auto'),
highlightOptions = highlightOptions(
color='#ff0000', opacity = 1, weight = 2, fillOpacity = 1,
bringToFront = TRUE, sendToBack = TRUE)) %>%
addPolygons(group = "Domestic Violence", fillColor = ~palBin_blue(`Domestic Violence`),
popup = ~topicPop, weight = 2.0, fillOpacity = myOpacity,
label=~geo_label, labelOptions= labelOptions(direction = 'auto'),
highlightOptions = highlightOptions(
color='#ff0000', opacity = 1, weight = 2, fillOpacity = 1,
bringToFront = TRUE, sendToBack = TRUE)) %>%
addPolygons(group = "Opiates", fillColor = ~palBin_blue(Opiates),
popup = ~topicPop, weight = 2.0, fillOpacity = myOpacity,
label=~geo_label, labelOptions= labelOptions(direction = 'auto'),
highlightOptions = highlightOptions(
color='#ff0000', opacity = 1, weight = 2, fillOpacity = 1,
bringToFront = TRUE, sendToBack = TRUE)) %>%
addPolygons(group = "Suicide", fillColor = ~palBin_blue(Suicide),
popup = ~topicPop, weight = 2.0, fillOpacity = myOpacity,
label=~geo_label, labelOptions= labelOptions(direction = 'auto'),
highlightOptions = highlightOptions(
color='#ff0000', opacity = 1, weight = 2, fillOpacity = 1,
bringToFront = TRUE, sendToBack = TRUE)) %>%
addPolygons(group = "Low Income", fillColor = ~palBin_blue(`Low Income`),
popup = ~topicPop, weight = 2.0, fillOpacity = myOpacity,
label=~geo_label, labelOptions= labelOptions(direction = 'auto'),
highlightOptions = highlightOptions(
color='#ff0000', opacity = 1, weight = 2, fillOpacity = 1,
bringToFront = TRUE, sendToBack = TRUE)) %>%
addLayersControl(baseGroups = c("Pregnant", "Domestic Violence", "Opiates", "Suicide", "Low Income"),
options = layersControlOptions(collapsed = FALSE)) %>%
addLegend(position = "bottomleft", colors = unique(palBin_blue(0:100)),
title = "Topic Scores",
labels = c("0-20", "20-40", "40-60", "60-80", "80-100"))
```
***
Both sentiment and topical scores can be used for macro planning related to:
* **Volume** -- who are the children, youth, and families who would best benefit from a new intervention
* **Needs** -- what are the classes of service need within this population
* **Distribution** -- where is the service need located
* **Service Matching** -- how do we match families to the most appropriate services
This map identifies geographies with the greatest need for attention by topic. Users are able to pan, zoom, and click for pop-up annotations with additional contextual information.
This helps identify volume and type of needs, along with distribution. With additional layers showing the distribution of service facitilies it can be possible to also see how demand and supply of services can be matched.
### Macro Planning -- Trajectory Visualization and Type Identifiation {data-commentary-width=500}
```{r generate prevalence figure to match trajectory viz}
df <-
data.frame(clus = c("Cluster 1", "Cluster 2", "Cluster 3"),
freq = c(5442, 679, 227)) %>%
mutate(pct = freq/sum(freq),
label = paste0("n = ", prettyNum(freq, big.mark = ",")))
myplot <-
ggplot(df, aes(x = clus, y = pct, fill = clus)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = label), vjust = 1) +
scale_y_continuous(labels = percent) +
labs(title = "Prevalence of Cluster Types in Population",
x = "", y = "% of Population") +
theme(legend.position = "none",
axis.title = element_text(size = 12),
axis.text = element_text(size = 11))
ggsave("img/dually-involved-youth_traj-viz_3-cluster--prevalence.png")
```
***
- Trajectory visualizations can show pathways of cases through institutional contacts
- Each individual trajectory is shown as a (narrow) horizontal line. Each subplot shows hundreds of individual cases, with cases sorted by their starting state, then the type and duration of the state that follows.
- These plots show the starting age, duration, sequence, and number of spells for different institutional contacts
- The multiple figures show different distinct cluster patterns. In this example:
- "Cluster 1" represents individuals with short involvements which start at a range of ages
- "Cluster 2" represents individuals with long, unbroken stints in child welfare cases. A third of this group has out-of-home placements, almost always at early ages.
- "Cluster 3"--the smallest group--represents individuals with long stints of out-of home placements, generally starting at early ages.
- These institutional involvement cluster types can also be displayed in tables or maps